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TIME  SERIES  MODEL  IDENTIFICATION 
AND  PREDICTION  VARIANCE  HORIZON 

Emanuel  Parzen 

Institute  of  Statistics 
Texas  ASM  University 


An  approach  to  time  series  modelling  is  described;  it 
classifies  the  time  series  into  one  of  three  memory 
types  (called  no  memory,  short  memory,  and  long  memory), 
and  then  finds  a  whitening  filter.  When  the  time  series 
is  short  memory  one  would  like  to  identify  the  whitening 
filter  type  as  AR,  MA,  or  ARMA  before  parameter  estim¬ 
ation.  A  new  tool  is  introduced  which  can  be  used  to 
diagnose  both  the  memory  type  of  a  ^time  series,^*  and 
the  whitening  rilter  type  of  a  short  memory  time  series. 
It  is  called  prediction  variance  horizon  function.  >anjL _ 

2  2  J 

is  defined  by  PVH(h)  =  1  -  ,  where  m  ispne  nor¬ 

malized  mean  square  prediction  error_jx£-4nFthite  memory 
prediction  h  steps  ahead. classify  the  model  type  of 
a  time  series,  one  uses  the  shape  of  PVH  and  the  value 
~  'sof  the  horizon  HOR  (defined  as  the  smallest  value  of  h 

r-rrTFor  which  PVH(h)*^j£  0.05)  The  analysis  of  a  real  time 
W...  FBeeze, 


0.  TIME  SERIES  MODEL  TYPES: 
LONG  MEMORY 


NO  MEMORY,  SHORT  MEMORY,  AND 


A  stationary  Gaussian  time  series  Y(t),  with  zero  means, 
with  covariance  function  R(v)  =  E[Y(t)Y(t+v) ]  and  correlation 
function  p(v)  =  R(v)/R(0)  can  be  represented  in  general 

Y(t)  =  S(t)  +  Z(t)  , 

S(t)  -  T.  {A.  cos  —  t  +  B.  sin  t)  , 

j  3  Tj  2  j 

Z (t)  -  l  B.e(t-k) 
k=0  K 


*  Supported  in  part  by  the  Office  of  Naval  Research,  Con.  M00014-7R-C-0S99. 
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for  suitable  periods  ij ,  constants  3^,  and  uncorrelated  random 
variables  A^  ,  Bj ,  e(t)  .  Thus  Y(t)  is  a  sum  of  a  scheme  of 

hidden  periodicities  and  an  MA(°°)  scheme.  From  a  single  real- 

) 

2 

ization  of  the  time  series  one  can  hope  to  estimate  6^  and  a  , 
the  variance  of  e(t);  however  one  can  only  estimate  the  values 
of  Aj  and  in  that  realization,  and  not  their  distribution 
as  random  variables. 


Both  S(t)  and  Z(t)  are  stationary  time  series.  When  model 
ing  Y(t)  by  an  ARMA  (p,q)  one  is  assuming  that  only  the  com¬ 
ponent  Z(t)  is  present.  The  ARIMA  model  introduced  by  Box 
and  Jenkins  (1976)  can  be  regarded  as  an  approach  to  modeling 
Y(t)  when  seasonal  or  periodic  components  S(t)  are  present 
(and  also  when  trend  terms  are  present) .  This  paper  proposes 
that  while  general  models  for  seasonal  and  trend  components 
cannot  be  defined,  one  can  develop  general  diagnostics  for 
their  presence.  Consequently,  one  is  able  to  distinguish  be¬ 
tween  stationary  time  series  like  Z(t)  which  can  be  approxi¬ 
mately  modeled  by  an  ARMA(p,q),  and  those  whose  models  require 
terms  like  S(t). 

Parzen  (1979)  ,•  (1980)  proposes .  that  the  basic  strategy  of 
time  series  model  identification  is  to  determine  a  whitening 
filter  in  such  a  way  that  time  series  decomposition  filters 
can  be  obtained  as  interpretations  of  the  whitening  filter.  A 
whitening  filter  is  one  which  transforms  the  time  series  to 
white  noise: 


Time  Series 
Y(t) 


Whitening  — »  White  noise,  denoted 
Filter  e(t)  or  Yv(t) 


It  is  closely  related  to  a  prediction  filter  which  generates 
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a  predictor,  YM(t)  since  Yv(t)  =  Y(t)  -  Yy(t): 


Time  series 
Y(t) 


Prediction 

Filter 


One-step  ahead  infinite 
memory  prediction  or 
forecast,  denoted 
YM(t)  or  Yw(t|t-1) 


To  identify  the  whitening  filter  of  a  time  series,  determine 
its  model  memory  type,  and  identify  an  iterative  model  for 
the  time  series: 


No  Memory  No  Memory 

Residuals  e  Residuals  c 


This  paper  describes  various  ways  to  define  the  three 
time  series  types,  using  (1)  correlations,  (2)  spectral  den¬ 
sities,  (3)  innovation  variances,  (4)  spectral  distribution 
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functions,  (5)  prediction  variance  horizon  function,  and  (6; 
S-PLAY  diagnostics.  The  various  approaches  do  not  lead  to 
equivalent  definitions;  they  are  intended  to  illustrate  the 
qualitative  conclusions  we  seek  to  form  about  models  obeyed 
by  an  empirical  time  series. 

Let  us  illustrate  how  the  concept  of  time  series  memory 
type  changes  our  ways  of  describing  time  series  models.  Con¬ 
sider  the  model 

g2(L)Y(t)  =  h2(L)e(t) 

1c 

where  L  is  the  lag  operator  defined  by  L  Y(t)  =  Y(t-k)  , 
g2(L)  =  I-1.97L  +  L2  ,  h2(L)  =  I-1.65L  +  . 64L2  . 

A  researcher  might  describe  this  as  an  ARMA(2,2)  model,  even 

though  g2(L)  has  roots  on  the  unit  circle;  g2(L)  is  approxi- 

2 

mately  second  differencing  I-2L  +  L  If  g2 (L)  were  in  fact 
second  differencing,  some  researchers  would  describe  the  model 
as  ARIMA(0 , 2 , 2) .  This  paper  proposes  that  more  insight  is 
obtained  by  writing  the  model  as  an  iterated  model 

g2(L)Y(t)  =  Y(t)  ,  Transform  long  memory  Y  to 

short  memory  Y  , 

Y(t)  =  h2(L)e(t)  ,  Transform  short  memory  Y  to 

no  memory  e . 

The  classification  of  a  sample  (Y(t)  ,  t  *  1,  2,  ...,  T) 
will  be  based  on  various  diagnostics,  derived  from  basic 
sample  statistics  which  constitute  a  generalized  harmonic 
analysis;  thus,  the  first  step  in  the  empirical  analysis  of  a 

sample  (Y(t) ,  t  *  1,  2 . T)  is  the  calculation  of:  (1) 

the  sample  spectral  density  function  f  (X),  -0.5  <_  X  <_  0.5 
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de fined 


f(X) 


l  Y(t)  exp(2rriAt) 
t=l _ 

l  Y2(t) 
t=l 


(2)  the  sample  spectral  distribution  function  F(A)’  0  £  A  £  0.5, 
defined  by 


A  _ 

F(A)  =  2/q  f (A ' )  dX*  , 

(3)  the  sample  correlation  function  p(v) ,  v  =  0,  1,  T-l, 

defined  by 


T-v 

l  Y(t)Y(t+v) 

p(v)  -  - 

l  Y2 (t) 
t=l 


■  /-Q55  e2”UV  f<A)dX  ■ 


Usually  the  foregoing  quantities  are  computed  for  the 

T 

mean- detrended  time  series  Y(t)-Y,  Y  =  (1/T)  \  Y(t).  When 

t=l 

the  time  series  has  a  strong  trend  component,  a  better  de¬ 
trending  procedure  is  non-stationary  subset  autoregression 
which  yields  ARARMA  models  [Parzen  (1981)] 


1.  MEMORY  TYPE  BY  CORRELATION  FUNCTIONS 


The  sample  correlation  function  p(v)  of  a  time  series  has 
the  same  mathematical  properties  as  the  correlation  function 
p(v)  of  a  zero  mean  covariance  stationary  time  series  Y(t). 
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In  terms  of  p(v)  ,  the  definition  of  the  three  time  series 
memory  types  is : 


No  Memory 

Short  Memory 

Long  Memory 

oo 

l  1 p(v) |  =0 

oo 

0  <  X  1  p (v>  |  <°° 

l  | P(v) |  -  » 

V— 1 

V=1 

V=1 

Within  short  memory  time  series  there  are  three  types 
whose  classification  in  terms  of  correlation  functions  is  as 
follows : 

MA(q)  p(v)  =  0  for  v  >  q  (note  MA(0)  is  white  noise) ; 

AR(p)  There  exist  ctp  such  that 

p(v)  +  a1p(v-l)  +  .  .  oipp (v-p)  =0,  v  >  0  ; 

ARMA(p ,  q)  There  exist  a-,,  ....  a  such  that 

p(v)  +  a-^p(v-l)  +  .  .  .+app(v-p)  =0,  v  >  q  . 

Examples  of  AR  and  ARMA  are:  AR(1)  p(v)  =  aV,  v  >  0  ; 
p(v)  -  .ap(v-l)  =0,  v  >  0  :  ARMA(1,1)  p(v)  =  yav  ,  v  >  0 
p(v)  -  ap(v-l)  =  0  ,  v  >  1  ,  where  0  <  y  <  1  . 

Within  long  memory  stationary  time  series  some  types  are 

WHITE  BANDLIMITED  NOISE  p(v)  =  . 

PERIODIC  p(v)  =  cos^v  ; 

PERIODIC  PLUS  LTilTE 

NOISE  p(v)  =  ycos^-v  ,  v  >  0  . 

2  it 

The  long  memory  time  series  Y(t)  =  A  cos  — t  +  N(t)  can 
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be  transformed  to  a  short  memory  time  series  by  forming 
Y(t)  =  (I-^L+L^) Y( t)  where  </>  -  2  cos(2n/p)  ;  we  call  this 
operation  second-order  quasi-differencing. 

The  definition  in  terms  of  correlations  is  intended  only 
to  introduce  the  three  time  series  memory  types.  It  is  riot 
our  final  definition,  as  correlations  do  not  provide  adequate 
means  of  identifying  time  series  models. 


2.  MEMORY  TYPE  BY  SPECTRAL  LOG  RANGE 

The  spectral  density  function  f(A),  -0.5  £  A  £  0.5  ,  of  a 
stationary  short  memory  time  series  is  defined  as  the  Fourier 
transform  of  p(v)  : 

oo 

f (A)  *  l  e"2triAv  p(v)  (  _Q  5  <  X  <  0.5  . 

V^-00 

The  spectral  log  range,  and  its  memory  types  are  defined 
by 


SPLR  =  log 


max 

min 


f(A) 

TTT7 


No  Memory 

Short  Memory 

Long  Memory 

SPLR  =  0 

0  <  SPLR  <  °° 

SPLR  =  » 

To  extend  this  definition  to  stationary  time  series  whose 
correlation  function  p(v)  is  not  summable  define,  for  any 
T  >  0  , 

fT(X)  -if  e2”1Xk  p(j-k) 
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which  is  a  non-negative  function  by  the  non-negative  definite 
property  of  p(v)  .  Next 

f  (A)  =  V  e"2irUv  (1--LX1)  p(v)  , 

|v|<T 

(l-M)p(v)  =  /^q55  e2ltiXv  f T ( A )  dA  . 

V/e  study  the  limits  of  these  equations  when  T  — *  °°.  When 
p(v)  is  summable,  f T ( A )  — t  f(A)  _>  0  .  Otherwise,  p(v)  is  the 
limit  of  Fourier  transforms  of  non-negative  functions,  and 
therefore  there  exists  a  spectral  distribution  function  F(A), 
-0.5  £  A  _<  0.5  ,  such  that 

p(v)  =  J_o  5  e2TTLXv  dF( A)  . 

It  could  be  argued  that  in  practice  we  are  estimating  not  f^.(A) 
but 

A 

FT(A)  =  /_q  5  f  T  ( A  * )  dA'  —>  F(A)  . 

The  general  definition  of  spectral  log  range  is 


SPLR  = 


lim 

T-*°° 


log 


max 

A 

min 

A 


fT(A) 

fT(A) 


3.  MEMORY  TYPE  BY  PREDICTION  VARIANCE 
The  time  series  memory  type  depends  on  the  amount  of  var¬ 


iation  in  the  spectral  density  f(A),  which  can  be  captured  in 
a  single  criterion  such  as  the  following: 
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°t  =  exP"/_o . 5  loS  f ( A ) dA  .  (1) 

0.5 

Note  that  f(A)  is  normalized  so  that  /_ g  ^  f(A)dA  =  1  .  Then 
2 

0  £  <_  1  since  by  Jensen's  inequality 

2  0.5  j-. 

°co  i  exp  {-log  f_0  5  f  ( A )  d  A }  =  e  =  1  . 

If  f  (a  )  =  1  identically  (no  memory  or  white  noise  time  series) 

2 

then  =  1  .  If  f(A)  approaches  zero  (long  memory  time  series), 

2  2 
then  aro  =  0  .  Otherwise  (short  memory  time  series),  0  <  <  1. 

It  is  shown  in  the  prediction  theory  of  stationary  time  series 
2 

that  defined  by  (1)  equals  the  infinite  memory  one-step 

2 

ahead  mean  square  prediction  error;  we  call  the  innovation 

variance.  The  classification  of  time  series  memory  type  in 
-  2  . 

terms  of  a  is : 

oo 


No  Memory 

Short  Memory 

Long  Memory 

2 

0=1 

0  <  a2  <  1 

a2  =  0 

oo 

oo 

2 

To  determine  the  value  of  ,  one-  uses  the  fact  that 

2  lim  2 

a  =  ,  a 

°°  m— »°°  m 

2 

where  om  is  the  finite  memory  m  one-step  ahead  mean  square  pre- 

2 

diction  error.  To  compute  cm  we  solve  for  . am  the 

Yule-Walker  equations 

p(v)  +  a^p(v-l)  +  ...  +  amp(v-m)  =  0  ,  v  =  1,  2,  ....  tn; 

2 

then  a  =  1  +  a,p(l)  +  ...  +  a  p(m)  . 
mi  m 
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The  coefficients  oy ,  • • • .  am  determine  a  prediction  error 
time  series 

Y(t)  =  g^L)  Y(t)  ,  g^z)  =  1  +  aLz  +  ...  +  amzm  ; 
autoregressive  spectral  density  estimator  of  order  m 


f  (A)  =  a 
mv  '  m 


i  /  2  7t  i  X  N 

|gm<e  ) 


,  0  <  A  <  0.5  ; 


and  autoregressive  spectral  distribution  function  estimator  of 
order  m 


Fm(A)  =  2/q  fm(A’)dA'  ,  0  <  A  <  0.5  . 

A 

From  estimated  correlations  p(v)  ,  one  solves 
p(v)  +  «m(l)  P  (v-1)  +  ...  +  ctm(m)p(v-m)  =  0  ,  v  =  1 . m, 

-A  2  A  A  /\  /\ 

am  =  1  +  am(l)p(l)  +  . . .  +  am(m)p(m)  for  successive  orders 
m  =  1,  2,  ...  (using  fast  algorithms);  one  forms  transfer  func¬ 
tions  g^z)  =  1  +  am(l)z  +  •••  +  am(m)zm  . 

To  choose  an  order  m  such  that  ot  =  o  one  computes  order 

m  1:0 

determining  functions,  such  as  AIC  or  CAT,  whose  absolute  min¬ 
imum  and  relative  minimum  are  used  to  determine  orders  m  of 
autoregressive  estimators  to  be  considered  as  "optimal". 

Akaike  information  criterion  is  [see  Akaike  (1979)] 

AIC (m)  =  log  ^ 

An  alternative  version  of  criterion  is  given  by  Hannan  and 
Quinn  (1979)  . 

Parzen  CAT  (Criterion  Autoregressive  Transfer  Function)  is 
a  measure  of  the  overall  mean  square  relative  error  of  g^  as 
an  estimator  of  gfii,  the  AR(ot)  transfer  function  [see  Parzen 
(1974)  ,  (1977)]  .  To  define  CAT  let  §?  =  a?  which  is  called 


Then 


an  unbiased  estimator  of  o'1 


CAT(m) 


m 


-2 


T  l  d'i 
1  1=1  J 


For  order  0,  define  AIC  =  0,  CAT  =  -1  .  However  if  one 
wants  to  increase  one's  probability  of  correctly  detecting 
white  noise,  one  might  adopt  definitions  such  as  CAT(O) 

-  -<l+i)  ,  AIC (0)  =  . 

In  practice,  identical  conclusions  are  usually  implied  by 

these  two  criterion  functions.  The  best  order  m  is  defined  as 

the  order  at  which  the  criterion  function  is  minimized  (if  m=0 , 

the  time  series  is  considered  to  be  white  noise  or  no  memory) . 

The  second  best  order  m(2)  is  defined  as  the  order  at  which 

the  smallest  relative  minimum  occurs  which  is  not  a  global 

minimum.  The  approximating  autoregressive  scheme  chosen  by 

^2  2 

an  order  determining  criterion,  yields  an  estimate  of  o 
which  provides  a  preliminary  diagnostic  of  the  memory  type  of 
a  time  series.  An  ad  hoc  rule  is: 


No  Memory 

Short  Memory 

Long  Memory 

~2  ,  4 

>  l-~ 
m  i 

otherwise 

A2  8 

am  <  T 

Nonstationary  autoregressive  schemes:  A  parallel  approach 
to  model  identification  is  to  fit  autoregressive  schemes  which 
may  be  non- stationary .  One  minimizes 


l  { Y ( t )  +  a(l)Y(t-l)  +  ...  +  a(m)Y(t-m)} 
t=m+l 
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to  form  estimators  a(l) ,  ....  a(m)  .  Then  form  the  squariance 
T 

S  =  l  { Y(t)  +  a(l) Y(t-l)  +  ...  +  a(m)Y(t-m) }2 
m  t=m+l 

2 

and  estimate  a  (representing  normalized  mean  square  error)  by 

i  i  T  2 

=  Tm  S/m  I  Y  (t)  . 

m  T-m  T 

If  am  is  near  zero,  which  might  be  interpreted  as  indicating 
a  model  which  fits  the  data  well,  one  instead  concludes  that 
the  time  series  is  non-stationary  and/or  long  memory,  and  one 
needs  to  model  the  residuals  Y(t)  =  Y(t)  +  a(l)Y(t-2)  +  ... 

+  a(m)Y(t-m)  .  If  is  near  1,  one  concludes  that  the  time 
series  is  nearly  white  noise,  and  the  coefficients  a(l),  ...» 
a(m)  are  themselves  undoubtedly  not  significantly  different 
from  zero. 


4..  MEMORY  TYPE  BY  SPECTRAL  DISTRIBUTION  FUNCTION 

A  quick  diagnosis  of  the  memory  type  of  a  time  series  can 
be  obtained  from  the  graph  of  the  sample  spectral  distribution 
function  F(A),  0  £  X  £  0.5,  which  together  with  p(v)  is  com¬ 
puted  as  the  first  step  in  our  approach  to  time  series  analysis: 


No  Memory 

Short  Memory 

Long  Memory 

F  Uniform 

F  otherwise 

F  has  sharp 
jumps 

Spectral  distribution  functions  play  an  important  role  in 


judging  the  goodness  of  fit  of  a  model.  The  mathematical  fit 
of  a  model  to  data  can  be  judged  by  the  fit  of  a  "smooth" 
function  representing  the  model  to  a  "wiggly"  function  repre¬ 
senting  the  data.  Spectral  distribution  functions  are,  in  my 
opinion,  ideal  representing  functions. 

The  orders  selected  by  the  minimum  of  an  autoregressive 
criterion  function  should  be  regarded  as  hypotheses;  one  tests 
them  by  how  well  their  autoregressive  spectral  distribution 
function  fits  the  raw  spectral  distribution  function,  as  mea¬ 
sured  by 

F(X)-F(X)  =  /o{o;|gm(e2lliu)r2-f(U)}dU  =  / q ( f (u) - f (u)  } du . 

In  addition,  the  residuals  of  the  autoregressive  filter  are 
tested  for  whiteness  by  examining 

/o(J7l8n,(e2'1U)|2*(u>-1>du  '  ^ 0  f(U>-:f(U)  du  • 

u  f (u) 

5.  MEMORY  TYPE  AND  ARMA  TYPE  BY  PREDICTION  VARIANCE  HORIZON 

An  extremely  useful  function  for  identification  of  a  time 
series  model  type  before  parameter  estimation  is  the  prediction 

variance  horizon  PVH(h) ,  h  =  1,  2 .  It  is  defined  in  terms 

of  the  normalized  mean  square  prediction  error  of  infinite 
memory  prediction  h  steps  ahead: 

°h,oo“  E[|Yv(t+h|t)|2]  f  E  { Y2  ( t)  ]  , '  YV  (t+h  1 1)  =  Y(t)-Yu(t+h|t), 


Yu(t+h | t)  =  E [ Y(t+h) ! Y(t) ,  Y(t-l) ,  ...] 
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2 

°h,c 


A  formula  for  c£  is  obtained  by  introducing  the  MA(°°)  repre¬ 


sentation  of  Y(t):  Y(t)  =  e(t)  +  3, e(t-l)  +  . . . 


Then 


o?  =  o2{ i  +  ef  + 

n ,  co  oo  l 


+  Vi* 


2  2 
The  graph  of  o,  m  increases  monotonically  from  om  at  h  =  1  to 

t 

1  as  h  tends  to  ®  .  We  define 


PVH(h)  =l-a 


h.< 


h  =  1,  2, 


and  define  horizon  HOR  to  be  the  smallest  values  of  h  for  which 
PVH(h)  £0.05  (whence  o^  m  £  .95)  . 

The  infinite  moving  average  coefficients  3^  are  estimated 
by  inverting  the  transfer  function  g^z)  of  an  approximating 
autoregressive  scheme  to  obtain,  for  k  =  1,  2,  ... 


a06k  + 


al6k- 


akS0 


0  . 


The  value  of  the  horizon  HOR,  and  the  shape  of  PVH,  can  be 
used  to  determine:  (1)  the  memory  type  of  a  time  series,  and 
(2)  if  it  is  short  memory,  whether  its  model  should  be  AR,  MA, 
or  ARMA.  A  long  horizon  indicates  a  long  memory  time  series. 

As  an  example,  suppose  one  fits  an  AR(1) :  Y(t) -pY(t-l)=e (t) 
when  p  =  1  the  time  series  is  considered  long  memory  since 
a2  =  1  -  p2  ±  0  ,  PVH(h)  =  p2h  ,  HOR  =  (log  .05)/2  log  p  *  °°  . 
When  p  *  0  ,  the  time  series  is  considered  no  memory.  When 
0  <  p  <  1,  the  time  series  is  considered  short  memory. 

The  classification  of  memory  type  by  prediction  horizon 
HOR  is: 


No  Memory 

Short  Memory 

Long  Memory 

HOR  ±  0 

0  <  HOR  <  00 

HOR  *  °° 
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By  HOR  *  “ ,  we  mean  HOR  is  comparatively  large:  experiments 

lead  us  to  conclude  that  one  should  compare  HOR  with  the  order 
ORD  of  the  approximating  autoregressive  scheme.  Let  HOR/ORD 
denote  the  ratio  of  HOR  to  ORD;  identify  time  series  as 
follows:  If  HOR/ORD  <  1  ,  then  MA(q) ,  with  q  <  HOR-1 .  If 

HOR/ORD  2l  4(say),  and  PVH  decays  slowly,  then  long  memory.  If 
PVH  declines  smoothly  and  exponentially,  then  an  AR(p)  is  in¬ 
dicated.  If  PVH  has  "bends”,  then  ARMA.  If  PVH  has  many 
level  stretches  with  period  t,  then  an  ARMA  model  is  indicated 
of  the  form 


I+8-,  L+0oL^+ .  .  .+6  Lq 

Y(t)  =  - ± - ±— - 9—  e(t)  . 

I-a  LT 


The  final  identification  of  the  orders  p  and  q  should  be  by 
parameter  estimation  or  by  use  of  S-arrays. 


MEMORY  TYPE  AND  ARMA  TYPE  BY  S-ARRAYS 


Gray,  Kelley,  and  Mclntire  (1978)  define  a  double  sequence 


S- Array :  Sm(-q) , 


*  0°>’  *1™ . Vq>- 


m'  '  '  ~m' 

m  =  1 ,  2,  ...,  p  ,  whose  constancy  patterns  are  in  1-1  corres¬ 
pondence  with  ARMA  schemes,  and  more  generally  with  character¬ 
istics  of  sequences  satisfying 


p(v)  +  o^pCv-l)  +  ...  +  app(v-p)  =  0  ,  v  >  q  . 

/n\ 

The  generalized  partial  correlation  is  defined  by  ‘/“-am(m)  , 
where  am(m)  is  the  value  of  the  last  autoregressive  coefficient 
in  an  ARMA(m,n)  scheme  obtained  by  solving  higher  order  Yule 
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Walker  equations 

p(j)  +  am(l)p(j-l)  +  . . .+  am(m)p(j-m)  =0,  j  =  m  +  1,  .  .  .  n+m. 

Woodward  and  Gray  (1980)  show  that 

l"m(n)|  =  l5*(n)/S^(-(n+l>) |  ,  n=0,  1,  ...  . 

The  display  which  we  call  S-PLAY  combines  various  statisti- 

•k 

cal  diagnostics  with  a  new  way  of  displaying  S  -arrays.  For 
frequency  X  (equal  to  0  or  0.5)  S-PLAY  displays  for  each  auto¬ 
regressive  order  p  =  1 ,  2,  ... 

V  Innovation  variance  o 

P 

PL  Partial  correlation  I  tt  I 

p 

-AIC  The  value  of  -AIC(p) 

a 

SPEC  Autoregressive  spectral  estimator  fp(X)  at 
frequency  X . 

GSP1 ,  GSP2  G-spectral  estimators  at  frequency  X 

P2D  Minimum  value  of  2nd  differences  of  generalized 
partial  correlations  2it'  +  ^p  » 

value  of  n  at  which  minimum  attained  is  printed  as 
value  of  symbol  M.  If  one  adopts  p  as  autoregres¬ 
sive  order,  one  might  consider  M  as  moving  average 
order. 

It  should  be  noted  that  the  rows  of  S-PLAY  are  the  columns 
of  the  shifted  S-Array  as  defined  by  Woodward  and  Gray  (1980) . 

The  constancy  patterns  in  the  S-array  which  are  character¬ 
istic  of  an  ARMA(p,q)  are  described  below,  as  are  the  patterns 
characteristic  of  long  memory.  S-PLAY  provides  diagnostics 
which  can  be  used  to  confirm  our  conclusions  about  a  time  series 
model  type  as  follows: 

_ _  J 


Short  Memory 


No  Memory 

Infinities  in 
Column  -  1 


ARMA(p,q)  if 
row  p,  column  q, 
column  -  (q-f-I ) 
have  constant 
values  as  des¬ 
cribed  in  (a) 
and  (b) 


Long  Memory 

Constant  row  1 
(trend) 

Constant  row  2 
(seasonal) 


(a)  C2  =  Sp (- (q+1) )  =  Sp (- (q+2) )  =  ...  and  Cx  =  Sp(q) 

=  Sp(q+1)  =  ...  implies  that  irp^^  =  irp^+^  =  ...  ;  in  words, 
if  one  solves  for  the  coefficients  of  a  p-th  order  autoregres¬ 
sive  scheme  using  high  lag  Yule-Walker  equations  of  lag  q, 
q+1,  ....  one  obtains  the  same  value  for  the  p-th  coefficient 
ap(p)  from  all  of  these  sets  of  equations. 

(b)  -CL  =  Sq(p+1),  Cl  =  Sq(p+2),  ...,  (-1)1  Cx  =  Sq(p+i) , 

+  °°  =  Sp+i(-(q+l))  for  i  >_  1  ,  implies  that  irp^  =  irp^2  =...=0; 
in  words,  if  one  solves  (for  i=l,  2,  ...)  for  the  coefficients 
of  a  (p+i)-th  order  autoregressive  scheme  using  high  lag  Yule- 
Walker  equations  of  lag  q,  one  obtains  value  0  for  the  last 
coefficient  ap+i(P+i)  • 

It  should  be  noted  that  experience  and  judgement  is  re¬ 
quired  to  decide  when  a  pattern  of  approximate  constancy  exists 
in  an  S-array  of  an  observed  time  series.  Further  the  failure 
of  such  patterns  to  exist  should  be  expected  as  most  time 
series  are  not  exactly  an  ARMA  process,  but  at  best  are  approx¬ 
imately  modelled  by  an  ARMA  process. 


7.  CASE  STUDY  OF  AN  EMPIRICAL  TIME  SERIES  ANALYSIS 


The  time  series  Y,  called  Freeze,  and  graphed  in  Fig.  27, 
represents  minimum  temperatures  (in  degrees  centrigrade)  over 
10  day  intervals  (with  some  11  day  intervals) .  There  are  36 
values  per  year.  The  sample  size  T  =  992. 

Step  1 .  Autoregressive  Analysis  of  Y 

Compute  sample  mean  Y  =  5.02,  sample  variance  R(0)  =27.4  . 
For  Y(t)-Y  ,  compute  sample  spectral  density  f(A),  sample  cor¬ 
relations  p(v)  and  the  sample  spectral  distribution  function 
F(A)  .  p(Fig.  2)  has  a  strictly  periodic  appearance,  with 

a 

periodicity  36.  Some  noteworthy  maximum  values  are  p(l)  =  .7163, 
p ( 36 )  =  .6765,  p  (  .  73)  =  .6314.  F(Fig.  1)  rises  linearly  except 
for  a  sharp  jump  of  .66  at  a  frequency  corresponding  to  a  period 

A 

of  about  36.  The  character  of  p  and  F  leads  us  to  suspect  a 
model  of  periodic  signal  S(t)  plus  white  noise  N(t) : 

Y(t)  =  S(t)  +  N(t)  ,  (1) 

S(t)  =  A  cos  ^-t  +  3  sin  ^-t  ,  p  =  36  .  (2) 

A  model  of  quasi-periodic  signal  S(t)  would  be 

S(t)  -  (j)S(t-l)  4-  S(t-2)  =  6  (t)  white  noise  (3) 

where  <t>  =  2  cos  2tt/p  =  1.97  for  p  =  36  .  The  variance  of  N(  ) 

is  approximately  30%  of  the  variance  of  Y(.)  . 

Next  one  solves  Yule-Walker  equations  in  p(v)  for  successive 

orders  m  =  1,  2 . M,  where  M  is  chosen  in  the  range  T/10 

<_  M  £  3T/4  ,  depending  on  the  size  of  T.  Here  M  =  80.  Graphs 
of  log  ,  AIC,  and  CAT  are  displayed  in  Figs.  3,  4.  CAT 

chooses  m  =  36 ,  m(2)  =  39  as  best  and  second  best  orders.  It 
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should  be  noted  that  for  Freeze  AIC  and  CAT  are  unusually  flat 
in  the  vicinity  of  their  minimum  order.  The  prediction  variance 
horizon  (Fig.  5)  indicates  Freeze  is  a  long  memory  time  series; 
the  order  1  rows  of  S-PLAY  indicate  long  memory;  the  innovation 
variance  =  .32  does  not  indicate  long  memory.  The  auto¬ 
regressive  spectral  density  estimator  £( A)  of  order  36  in  Fig. 

6  and  second  best  order  39  in  Fig.  8  indicates  a  sharp  peak 
at  frequency  1/36  (with  period  36);  Fig.  7  shows  that  F  matches 
F. 

Step  2.  Analysis  of  Autoregressive  Residuals 

The  residuals  Y(t)  =  g^dOY^)  of  the  AR(36)  scheme  yield 
the  conclusion  that  they  are  white  noise  under  autoregressive 
analysis  (Figs.  9-16). 

Step  3.  Transformation  of  Long  Memory  Y  to  Short  Memory  Y 
When  Y  has  a  long  horizon,  we  seek  to  find  a  suitable 
transformation,  to  short  memory;  we  choose 

Y(t)  =  g2(L)Y(t)  ,  g2(L)  =  I  -  1.97L  +  L2  . 

Its  variance  42.7  is  larger  than  that  of  Y(t),  which  one  can 
explain  by  the  approximate  calculation  Var  (Y(t)  -  2Y(t-l) 

+  Y(t-2)]  =  6R(0)  =  8R(1)  +  2R(2)  .  The  raw  spectral  distri¬ 
bution  function  (Fig.  18)  of  Y(-)  suggests  high  frequency 
content.  Its  correlation  function  (Fig.  17)  suggests  a  MA(2) 
model  since  p(l)  =  -.6405,  p(2)  =  .1211  and  these  are  the 
only  correlations  much  greater  than  .0  6*  2//T  . 

The  AIC  and  CAT  determined  (Fig.  19,  20)  optimal  autore¬ 
gressive  approximating  order  for  Y  is  m  *  15  ;  note  that  AIC 

-  2 

is  much  flatter  than  is  CAT  near  the  minimum  =*  .  24  . 


PVH  (Fig.  21)  decreases  very  quickly  Co  zero,  and  HOR  =  3  sug¬ 
gests  MA(2)  .  S-PLAY  does  not  seem  to  yield  definite  conclu¬ 
sions,  as  it  should  not  when  the  true  model  is  a  pure  moving 
average. 

The  model  g^(L)Y(t)  =  e(t)  yields  one-step  ahead  predic¬ 
tions  with  average  squared  error  8.7  whose  ratio  to  the  var¬ 
iance  of  Y(t)  is  .32,  exactly  equal  to  the  prediction  variance 
of  the  AR(36)  scheme  fitted  to  Y(t) .  One  can  conclude  that 
the  iterative  analysis 

Y  Y  -  g2Y  e  =  g15Y  =  g15g2Y 

has  resulted  in  an  AR(17)  filter  with  the  same  forecasting 
(and  the  same  spectral  estimation)  properties  as  the  AR(36) 
model  g^^(L)Y  =  e  .  As  a  model  the  iterated  AR(2) ,  AR(15) 
model  has  more  insight  than  the  AR(36)  model. 

Step  4.  MA  Analysis  of  Y 

The  model  Y(t)  =  e(t)  +  B^e(t-l)  +  &2e(t_2),  an  MA(2)  , 
suggested  by  the  prediction  variance  horizon  function  and 
correlations  of  Y(-),  needs  to  be-confirmed  and  its  parameters 
estimated.  An  alternative  approach  to  identifying  the  ARMA 
type  is  to  examine  the  successive  models  for  Y  determined  by 
our  subset  ARMA  algorithm.  The  first  model  is  MA(1) ,  the 
second  model  is  MA(2) .  Thereafter  various  ARMA  models  are 
determined  whose  residual  variances  do  not  decrease  much.  Con¬ 
sequently  we  fit  Y(t)  by 

Y (t)  =  e (t)  -  1.6475  e(t-l)  +  .6382  e(t-2) 


with  residual  variance  .26  (compared  with  .24  for  AR(15)). 
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One  concludes  that  an  overall  model  for  Y  is  an  ARMA(2,2)  model; 
estimators  of  its  parameters  are 

Y(t)-1.97Y(t-l)+Y(t-2)  =  e(t)-1.6475e(t-l)+.6382e(t-2)  . 

The  spectral  density  of  the  MA(2)  model  for  Y  is  in  Fig.  23. 

The  spectral  density  of  the  ARMA  model  for  Y  is  in. Fig.  24. 

Step  5.  Signal  Plus  Noise  Interpretation  of  ARMA  Model 

While  the  ARMA(2,2)  model  may  not  be  most  satisfactory  for 
forecasting,  or  for  spectral  estimation,  it  is  most  satisfac¬ 
tory  for  insight.  It  suggests  that  the  original  time  series 
Y(t)  satisfies  (1)  and  (3),  since  6(t)  +  N(t)  -  1.97  N(t-l) 

+  N(t+2)  defines  an  MA(2)  . 

A  question  which  the  Freeze  time  series  may  illustrate 
is  the  question  of  how  well  autoregressive  spectral  estimators 
do  when  applied  to  data  which  is  a  moving  average.  Fig,  25 
displays  window  estimators  of  the  spectral  density  of  Y(t) , 
using  a  Parzen  window  with  truncation  points  16,  32,  64;  the 
best  of  these  truncation  points  is  32,  and  it  agrees  with  the 
AR(15)  spectral  estimator  in  Fig.. 22.  The  periodogram  being 
smoothed  is  shown  in  Fig.  26. 

Summary 

The  models  we  have  fitted  to  Y(t)  are  as  follows: 

Steps  1,2:  AR(36)  g36(L) (Y(t)-?}  =  e(t)  , 

Step  3:  AR(2) ,AR(15)  g2(L) ( Y ( t ) - Y }  =  Y(t)  , 

g15(L)Y(t)  =  e (t) 

AR(2) ,MA(2)  g2(L){Y(t)-Y)  =  Y(t)  , 

Y(t)  =  h2(L)e(t) 


Step  4: 
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Step  5:  Y ( t)  =  Y  +  S(t)  4-  N(t),  S(t)  -  1 . 97S (t-1) 

+  S(t-2)  =  S(t)  . 


COEFFICIENTS  ou  OF  AR(36)  MODEL  EctjY(t-j)  =  e(t) 
FOR  FREEZE  SERIES  Y(t) ,  t  =  1,  ....  992 


1-5 

-0.1897 

-0.0789 

-0.0831 

-0.0774 

0.0082 

6-10 

-0.0457 

-0.0052 

-0.0385 

0.0185 

0.0522 

11-15 

0.0095 

-0.0103 

0.0337 

0.0888 

0.0373 

16-20 

0.0311 

0.0400 

-0.0047 

0.0358 

0.0230 

21-25 

-0.0196 

0.0534 

0.0493 

0.0341 

-0.0178 

26-30 

-0.0138 

C . 0234 

0.0077 

-0.0371 

-0.0117 

31-35 

-0.0708 

-0.0319 

0.0131 

-0.0722 

-0.0410 

36 

-0.0456 

Residual 

Variance  = 

0.32125 

COEFFICIENTS 

a j  AR(15)  MODEL 

IcijY(t-j)  = 

e(t) 

FOR  TRANSFORMED  FREEZE  SERIES  Y(t) 

=  Y(t)  -  1. 

97Y(t-l)  +  Y(t-2) 

1-5 

1.6390  . 

2.0567  2.2628 

2.2869 

2.2303 

6-10 

2.0593 

1.8400  1.5492 

1.2572 

1.0049 

11-15 

0.7548 

0.4939  0.2857 

0.1349 

0.0482 

Residual 

Variance  = 

0.24223 

Successive 

Subset 

ARMA  Models  for 

Y(t) 

were : 

Y(t)  = 

e(t) 

-  1.647  e (t-1) 

Y(t)  * 

£  (t) 

-  1.647  e ( t-1)  + 

.638 

e  (t-2) 

Y(t)  * 

e(t) 

-  1.647  e (t-2)  + 

.638 

e (t-2)  -  .059  Y 

Residual  variances  are  .359,  .262,  .259. 
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8.  CONCLUSIONS 

The  final  model  fitted  to  a  time  series  will  often  be  an 
iterated  model  (with  symbolic  transfer  functions  G  and  g^) 

Y(t)  —  G  — »  Y(t)  —  g^  — ►  e(t)  white  noise 

where  Y(t)  is  the  results  of  a  transformation  chosen  to  trans¬ 
form  a  long  memory  to  a  short  memory  one.  One  should  always 
analyze  the  residuals  of  an  approximating  autoregressive  filter 
to  determine  if  they  are  white  noise. 

Autoregressive  analysis  by  Yule-Walker  equations  yields  a 
stationary  autoregressive  scheme;  a  non-stationary  autoregres¬ 
sive  scheme  may  be  fit  by  estimating  its  coefficients  by  or¬ 
dinary  least  squares.  Parzen  (1981)  introduces  the  terminology 
ARARMA  scheme  for  the  iterated  time  series  model  with  G  non- 
stationary  autoregressive,  and  ARMA;  an  ARIMA  scheme  cor¬ 
responds  to  a  pure  differencing  operator  for  G. 

A  model  frequently  fitted  to  monthly  economic  time  series 
is  the  so-called  "airline"  model  [see  Parzen  (1979)]: 

(I-L)(I-L12)  ’Y(t)  =  (I-01L)(I-G12L12)  e(t)  . 

It  seems  doubtful  that  this  model  would  be  judged  adequate  by 
the  criteria  proposed  in  this  paper. 

It  may  happen  that  long  memory  components  continue  to  be 
present  even  after  several  iterations;  thus  the  final  model 
might  be  of  the  form 


The  iterated  filter  model  can  be  used  for  forecasting,  for 
spectral  analysis,  and  for  model  interpretation. 
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